Abstract — Computer-aided detection (CAD) systems are useful for
automatic lung nodule detection in computed tomographic (CT) images,
as the sheer volume of information in CT datasets is overwhelming for
radiologists to process. First, a segmentation scheme is used as a
preprocessing step. Then, nodule candidates are detected by eigenvalue
decomposition of the Hessian matrix and multi-scale dot enhancement
filtering. After the initial detection of nodule candidates, feature
descriptors are extracted. The feature descriptor is refined by wall
detection and elimination. An Evolutionary Support Vector Machine
(ESVM) is trained to classify nodules and non-nodules. The proposed
CAD system is validated on Lung Image Database Consortium (LIDC) data.
Experimental results show that the detection scheme achieves 98.3%
sensitivity with only 11 false positives per scan.

Index Terms — CT; Pulmonary nodule detection; CAD; 
Feature extraction


I. INTRODUCTION 

The number of deaths caused by cancer [10] continues to increase.
Lung cancer is the most common and most fatal cancer in the world.
Lung cancer usually causes no symptoms early in the disease process
and is mostly diagnosed at a late stage in a clinical setting, when
the probability of cure is low. At the time of diagnosis, most
patients already present with advanced disease. Screening is expected
to detect lung cancer at an early stage and reduce mortality. The goal
of CAD [12] is to assist radiologists by increasing scanning
efficiency and potentially improving nodule detection.

Generally, a nodule detection system comprises three steps: lung
segmentation, nodule candidate detection, and classification. Several
researchers have presented methods for segmenting [17] the lung volume
from a pulmonary CT scan. The segmentation is usually carried out by
thresholding, and various thresholding schemes [2, 5] have been
implemented. After thresholding, the lung volume is extracted from the
segmented images using 3D approaches [8]. The lung volume is segmented
without artifacts by 3D-connected component labelling [6, 15]. The
extracted lung volume needs to be refined to include juxta-pleural
nodules. Because of the complexity of these approaches, several
methods have been presented for refining a lung mask. Recently, a
chain code representation of the lung mask was also proposed in an
attempt to correct the contours.

In the segmented lung volume, nodule candidates have been detected
using various methods. Here, eigenvalue decomposition of the Hessian
matrix and multi-scale dot enhancement filtering are applied to detect
and segment nodule candidates. After nodule candidate detection, many
false positives remain and must be eliminated. False positives are
eliminated by feature extraction and classification techniques.

The features are extracted from the detected nodule candidates. In
this detection scheme, Angular Histograms of Surface Normal (AHSN)
features [7] are extracted. Finally, nodules are detected with an
Evolutionary Support Vector Machine classifier [20, 21] using the
extracted features, yielding a minimal number of false positives.

II. MATERIALS AND METHODS 

A. Proposed pulmonary nodule detection scheme 

The proposed nodule detection scheme comprises three main steps:
lung volume segmentation, feature extraction, and ESVM-based
classification [20, 21].



Fig-1 : Overall pulmonary nodule detection scheme

a) Lung Volume Segmentation

Lung volume segmentation [17] is the pre-processing step in the
detection process. It comprises three steps: thresholding, lung
region extraction, and contour correction [14].

(1) Thresholding 

Thresholding is performed first in lung volume segmentation to
separate low-density regions from high-density regions. The
high-density regions principally comprise the body surrounding the
lung cavity, whereas the low-density regions comprise the lung cavity,
the air surrounding the body, and other low-intensity areas. The 3D
volume of a CT scan is denoted $\mathit{Im}(x, y, z)$, where x and y
are the in-slice coordinates and z is the slice number. The volume
consists of Z slices, and each slice has dimensions X × Y. A fixed
threshold value is used to segment the lung volume. Thresholding
yields the initial lung mask $M_i$:

$$M_i(x, y, z) = \big[\, \mathit{Im}(x, y, z) < -500\ \mathrm{HU} \,\big] \qquad (1)$$

The chest wall, blood, and bone are much denser than the lung, so the
threshold value is set to -500 HU.

(2) Lung region extraction 

After thresholding, the lung region is extracted by 3D-connected
component labelling, which is applied to ensure region connectivity
over the thresholded volume $M_i$. The components are obtained using
an 18-connected neighborhood. The 18-connectivity voxels are shown in
Fig. 2, where the black point is the center point and the 18 white
points are its neighbors. Labelling yields the labeled volumes (L),
and the lung regions are chosen from them based on their volume. Air
surrounding the body is easily removed because it is attached to the
boundary of the volume. The largest and second-largest volumes in (L)
are selected as the lung region. The computational cost is reduced by
selecting the labeled volumes at the median slice, since the median
slice contains only a few non-body components. During volume
selection, most of the undesirable non-body components are discarded;
thus the air outside the body and gas in the intestine are removed.
At this point, the lung regions contain tiny holes, which are
generally nodules or vessels. A morphological hole-filling operation
is then performed, as these holes should be included in the lung
region.

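The extracted lung volumes are combined as follows:

$$M_L = \bigcup_{i} L_i \qquad (2)$$

where the $L_i$ are the selected labeled volumes (the largest and
second-largest volumes in L).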
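The thresholding and lung region extraction steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the volume is a small nested list indexed as `volume[z][y][x]`, the function names (`threshold`, `label_components`, `lung_mask`) are hypothetical, border air is rejected by testing only the x/y border of each slice, and the median-slice shortcut and hole filling are omitted.

```python
from collections import deque
from itertools import product

# 18-connected neighbourhood: the 6 face neighbours plus the 12 edge
# neighbours (offsets with one or two non-zero components; corners excluded).
OFFSETS_18 = [d for d in product((-1, 0, 1), repeat=3)
              if 1 <= sum(map(abs, d)) <= 2]


def threshold(volume, level=-500):
    """Binary mask: True where the voxel value is below `level` HU."""
    return [[[v < level for v in row] for row in sl] for sl in volume]


def label_components(mask):
    """Return the 18-connected components as sets of (z, y, x) voxels."""
    Z, Y, X = len(mask), len(mask[0]), len(mask[0][0])
    seen, components = set(), []
    for start in product(range(Z), range(Y), range(X)):
        z, y, x = start
        if not mask[z][y][x] or start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first flood fill from the seed voxel
            cz, cy, cx = queue.popleft()
            for dz, dy, dx in OFFSETS_18:
                n = (cz + dz, cy + dy, cx + dx)
                if (0 <= n[0] < Z and 0 <= n[1] < Y and 0 <= n[2] < X
                        and mask[n[0]][n[1]][n[2]] and n not in seen):
                    seen.add(n)
                    comp.add(n)
                    queue.append(n)
        components.append(comp)
    return components


def lung_mask(volume, level=-500):
    """Union of the two largest components not touching the x/y border."""
    mask = threshold(volume, level)
    Y, X = len(mask[0]), len(mask[0][0])
    inner = [c for c in label_components(mask)
             if not any(y in (0, Y - 1) or x in (0, X - 1) for _, y, x in c)]
    inner.sort(key=len, reverse=True)
    return set().union(*inner[:2])
```

A pure-Python flood fill is only practical for toy volumes; for real CT data an array library would be the usual choice.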



Fig-2 : 18-connectivity voxels


(3) Contour correction 

The contour-refined lung volume is obtained using a contour
correction method [6] based on chain code analysis. The extracted
lung mask volume $M_L$ is not smooth and does not contain
juxta-pleural nodules, which may affect system performance. To obtain
a smooth lung mask and to include juxta-pleural nodules in the lung
volume, contour correction is applied to the initial lung mask. A
chain code representation is used to remove the critical sections.
The eight chain codes considered are 0°, 45°, 90°, 135°, 180°, 225°,
270° and 315°.




Fig-3 : Contour correction using chain code representation 

The chain code representation used to remove a critical section is
shown in Fig. 3. Noise in the contours of the initial lung mask is
removed by Gaussian smoothing. A critical section is derived from its
critical points, which are identified by detecting transitions in the
angular direction of the contour. If the distance between a pair of
critical points is smaller than the typical nodule diameter, the pair
is selected for critical section correction. The next step is to join
the respective pairs of critical points and to fill the critical
sections.
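The chain code idea can be illustrated with a short sketch. It assumes the contour is already available as an ordered list of 8-connected `(x, y)` points with y pointing up, and it only lists the positions where the direction changes; the rule that promotes such transitions to critical points is not specified in the text, so none is implemented here. The names `chain_code` and `direction_changes` are hypothetical.

```python
# Freeman 8-direction chain code: each unit step maps to a multiple of 45
# degrees, matching the eight codes 0, 45, ..., 315 used in the text.
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(points):
    """Chain code in degrees for consecutive 8-connected contour points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(45 * DIRECTIONS[(x1 - x0, y1 - y0)])
    return codes


def direction_changes(codes):
    """(index, signed turn in degrees) wherever the direction changes."""
    changes = []
    for i in range(1, len(codes)):
        # Wrap the difference into [-180, 180) so 315 -> 0 is a +45 turn.
        turn = (codes[i] - codes[i - 1] + 180) % 360 - 180
        if turn != 0:
            changes.append((i, turn))
    return changes


contour = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2)]
codes = chain_code(contour)          # [0, 0, 90, 90, 180]
turns = direction_changes(codes)     # [(2, 90), (4, 90)]
```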

b) Nodule candidate detection 

Nodule candidate detection is the vital step in the overall detection
scheme, and the performance of the CAD system depends mainly on the
accuracy of the detected nodule candidates. In this method, the local
structure information of each voxel is determined by eigenvalue
decomposition of the Hessian matrix, and nodule candidates are
detected by multi-scale dot enhancement filtering.

(1) Eigenvalue decomposition of Hessian 
matrix 

Local structure information is derived from the eigenvalues and
eigenvectors obtained by decomposing the Hessian matrix. Gradient
information describes the structure of objects in an image,
identifying features or providing basic information for computer
vision applications. The Hessian is a square matrix of second-order
partial derivatives of a function; it describes the local curvature
of a function of several variables. Gaussian smoothing over a number
of scales is applied to the 3-D image to suppress noise prior to
gradient calculation.
The Hessian matrix H is decomposed using eigenvalue decomposition,
yielding three eigenvalues ($\lambda_1$, $\lambda_2$ and $\lambda_3$)
and eigenvectors ($e_1$, $e_2$ and $e_3$). The decomposition is given
by the following equation:

$$H = \lambda_1 e_1 e_1^{T} + \lambda_2 e_2 e_2^{T} + \lambda_3 e_3 e_3^{T} \qquad (3)$$

Hence explicit structural information about surfaceness, curvedness
and pointedness is obtained.


(2) Multi-scale dot enhancement filtering 


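The dot enhancement filter is used to enhance spherical objects in
order to identify nodules. The dot value for each voxel is defined as

$$z_{dot}(\lambda_1, \lambda_2, \lambda_3) = \begin{cases} \dfrac{|\lambda_3|^{2}}{|\lambda_1|} & \text{if } \lambda_1 < 0,\ \lambda_2 < 0,\ \lambda_3 < 0 \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$
($|\lambda_1| \ge |\lambda_2| \ge |\lambda_3|$) are the three
eigenvalues derived from the Hessian matrix.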
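As a small worked example of the dot value, the sketch below assumes the commonly used dot-enhancement form, in which the response is $|\lambda_3|^2 / |\lambda_1|$ when all three eigenvalues are negative (a bright blob) and zero otherwise, with the eigenvalues ordered by decreasing magnitude. The function name `dot_value` is hypothetical, and the formula is an assumption wherever the printed equation differs.

```python
def dot_value(l1, l2, l3):
    """Dot-enhancement response for three Hessian eigenvalues.

    Assumed form: |l3|^2 / |l1| if all eigenvalues are negative, else 0,
    after sorting so that |l1| >= |l2| >= |l3|.
    """
    l1, l2, l3 = sorted((l1, l2, l3), key=abs, reverse=True)
    if l1 < 0 and l2 < 0 and l3 < 0:
        return abs(l3) ** 2 / abs(l1)
    return 0.0


blob = dot_value(-2.0, -2.0, -2.0)    # isotropic blob: strong response
tube = dot_value(-4.0, -4.0, -0.1)    # vessel-like: one weak eigenvalue
plane = dot_value(-4.0, 0.0, 0.0)     # sheet-like: no response
```

With this form a spherical structure, whose three eigenvalues are similar, scores much higher than a tubular or planar one, which is why thresholding the response isolates nodule-like candidates.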

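The diameter of the nodules is assumed to be in the range
$[d_0, d_1]$. The N discrete smoothing scales $\sigma_n$ in the range
$[d_0/4, d_1/4]$ are calculated as

$$\sigma_n = r^{\,n-1} \sigma_1, \quad n = 1, \dots, N \qquad (5)$$

where $r = (d_1/d_0)^{1/(N-1)}$ and $\sigma_1 = d_0/4$, and each scale
corresponds to a nodule diameter of $4\sigma_n$.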
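The scale sequence is easy to check numerically. The sketch below assumes the geometric spacing $\sigma_n = (d_0/4)\,(d_1/d_0)^{(n-1)/(N-1)}$, which is the form consistent with the five scales reported in the results for nodule diameters between 3 mm and 30 mm; the function name `smoothing_scales` is hypothetical.

```python
def smoothing_scales(d0, d1, n_scales):
    """Geometric sequence of n_scales (>= 2) scales from d0/4 to d1/4."""
    return [d0 / 4 * (d1 / d0) ** (k / (n_scales - 1))
            for k in range(n_scales)]


scales = smoothing_scales(3.0, 30.0, 5)
print([round(s, 2) for s in scales])  # [0.75, 1.33, 2.37, 4.22, 7.5]
```

The fourth value is 4.2176 before rounding, which the text reports truncated as 4.21.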


Here the detection scheme uses five smoothing scales for nodule
diameters in the range [3 mm, 30 mm]. The locations of nodule
candidates are found by evaluating the dot value in the dot-enhanced
image. The nodule candidates are detected by applying a threshold to
the dot-enhanced image. The threshold is obtained by averaging the
local maximum dot values, and it differs for each scale. The position
of each nodule candidate is detected from the local maximum dot
value. An image section around the identified position is taken as
the nodule candidate. The dimension of the image section is
$d_n \times d_n \times d_n$ in isotropic-sized voxels. The size of
the image section for a given smoothing scale is given by the
relation

$$d_n = \lceil 4 \sigma_n I + B \rceil \qquad (6)$$

where I is the interpolation factor in each direction for
isotropic-sized voxels, B represents the boundary pixels around the
desired object, and the brackets denote the ceiling function. The
interpolated image section $I_s$ is used as input for feature
extraction.
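A quick numerical check of the image section size follows. It assumes the relation $d_n = \lceil 4\sigma_n I + B \rceil$ with an interpolation factor $I = 1$ (1 mm isotropic voxels) and $B = 2$ boundary voxels; these two values are assumptions chosen because they reproduce the section sizes reported in Section III-B, and the name `section_size` is hypothetical.

```python
import math


def section_size(sigma, interp=1.0, border=2):
    """Edge length d_n (in voxels) of the cubic image section for a scale."""
    # round() guards against floating-point noise just above an integer,
    # which would otherwise push ceil() up by one.
    return math.ceil(round(4 * sigma * interp + border, 9))


scales = [0.75, 1.33, 2.37, 4.21, 7.5]      # scales listed in Section III-B
print([section_size(s) for s in scales])    # [5, 8, 12, 19, 32]
```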


c) Feature Extraction 

Features are information that describes the characteristics of the
nodule candidates. In the detection system, these features are used
to train the ESVM. The detected nodule candidates are classified as
nodules or non-nodules using the extracted feature information.
Shape-based features [3] are extracted. The features are constructed
from surface constituents (surface saliency and surface normal
vector) that are obtained through eigenvalue decomposition of the
Hessian matrix H. The shape-based descriptor describes the shape of
the desired object based on the orientation probability of the
surface normal. Hence, the surface saliency and surface normal vector
are obtained from the input image. To compute the surface normal, the
eigenvalue decomposition of the Hessian matrix H is applied to every
voxel in the desired image.

Angular histograms of surfaceness information are obtained to
characterize the shape of the desired object. These histograms
indicate the angular direction of the surface normal vector weighted
by the respective surface saliency. The orientation of the surface
normal vector must be obtained before the AHSN feature is calculated.
For a surface normal vector $e = (e_x, e_y, e_z)$, the orientation is
denoted by the altitude $\theta$ and azimuth $\phi$ in spherical
coordinates:

$$\theta = \cos^{-1}\!\left(\frac{e_z}{\lVert e \rVert}\right) \qquad (7)$$

$$\phi = \tan^{-1}\!\left(\frac{e_y}{e_x}\right) \qquad (8)$$

The altitude $\theta$ varies in the range [0, 180] degrees, and the
azimuth $\phi$ varies in the range [0, 360] degrees.
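The two orientation angles can be computed as follows. This sketch assumes the usual spherical-coordinate convention, with the altitude measured from the z axis by an arccosine and the azimuth measured in the x-y plane; `atan2` is used instead of a plain arctangent so that the azimuth covers the full circle, which is a choice made for this example. The function name `orientation` is hypothetical and the input vector is assumed to be non-zero.

```python
import math


def orientation(e):
    """Altitude in [0, 180] and azimuth in [0, 360), both in degrees."""
    ex, ey, ez = e
    norm = math.sqrt(ex * ex + ey * ey + ez * ez)
    # Clamp the cosine to [-1, 1] to absorb rounding error before acos().
    theta = math.degrees(math.acos(max(-1.0, min(1.0, ez / norm))))
    # atan2 resolves the quadrant; the modulo maps (-180, 180] to [0, 360).
    phi = math.degrees(math.atan2(ey, ex)) % 360.0
    return theta, phi


up = orientation((0.0, 0.0, 1.0))       # altitude 0: normal along +z
side = orientation((0.0, -1.0, 0.0))    # altitude 90, azimuth 270
```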



Fig-4 : Separating the altitude $\theta$ and azimuth $\phi$ into 45°
sections in spherical coordinates.

The shape features are extracted by arranging the surface normal
vectors into bins that divide the azimuth and altitude into 45°
sections. An altitude orientation ($\theta$) histogram with n bins is
generated, with each bin covering (180/n) degrees. Each sample in the
image section is added to a histogram bin. The contribution to the
histogram bin is weighted by the sample's surfaceness saliency and
normalized by the sum of surfaceness saliencies in the image section.
Likewise, the azimuth orientation $\phi$ is quantized into n bins,
with each bin covering (360/n) degrees, and each sample in the image
section added to a histogram bin [19] is weighted and normalized.
Thus, the dimension of the feature descriptor is 2n, and the
extracted AHSN feature is scale-invariant. Hence the shape of the
desired object is described by the AHSN feature descriptor.
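The histogram construction described above can be sketched as follows. The input format (one `(altitude, azimuth, saliency)` triple per voxel, angles in degrees) and the function name `ahsn_descriptor` are assumptions made for the example; the weighting by saliency and the normalization by total saliency follow the description in the text.

```python
def ahsn_descriptor(samples, n_bins):
    """Concatenated altitude and azimuth histograms (2 * n_bins values).

    samples: iterable of (theta_deg, phi_deg, saliency), one per voxel.
    Each vote is weighted by its saliency and the result is divided by
    the total saliency of the image section.
    """
    altitude = [0.0] * n_bins
    azimuth = [0.0] * n_bins
    total = 0.0
    for theta, phi, saliency in samples:
        # min() keeps the closed upper ends (180 and 360) in the last bin.
        altitude[min(int(theta / (180.0 / n_bins)), n_bins - 1)] += saliency
        azimuth[min(int(phi / (360.0 / n_bins)), n_bins - 1)] += saliency
        total += saliency
    if total == 0:
        return altitude + azimuth
    return [v / total for v in altitude + azimuth]


samples = [(10.0, 20.0, 1.0), (100.0, 200.0, 3.0)]
descriptor = ahsn_descriptor(samples, n_bins=4)
# altitude part: [0.25, 0.0, 0.75, 0.0]; azimuth part: [0.25, 0.0, 0.75, 0.0]
```

Because each half is normalized by the same total saliency, the descriptor does not depend on how many voxels the image section contains.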

(1) Depuration of feature descriptor 

A wall detection and elimination technique is implemented to refine
the surface-based feature descriptor. The presence of the lung wall
adversely affects the nodule detection scheme, because walls
generally have larger surface areas than other entities and therefore
distort the shape feature descriptor. Eliminating walls enables more
accurate nodule detection, and the wall elimination method is used to
detect non-isolated nodules.

First, the walls must be detected and removed. Walls are detected by
finding the local maxima of the AHSN. Connected component labelling
is then applied to voxels whose normal vector orientations are
similar to the peaks, and the surfaces with similar normal vectors
are reconstructed. A reconstructed surface is treated as a wall and
eliminated if it is larger than the other parts of the lung. This
step is repeated until no wall remains. Thus the wall elimination
method effectively removes unnecessary walls near the desired object.

The wall identification and eradication algorithm is described in
Algorithm 1.

Algorithm 1

procedure WALL ERADICATION(I_s)              » Removing walls in nodule candidates; I_s: image section
    {S_surface, e_surface} ← STD(I_s)        » Hessian matrix decomposition
    X ← AHSN(S_surface, e_surface)           » Derive AHSN feature descriptor
    N_wall ← ∅
    repeat
        {θ_max, φ_max} ← findHighPeak(X)     » Identify the walls
        la ← labelling(θ_max, φ_max)         » Label connected components whose orientations are similar to the surface normal {θ_max, φ_max}
        for all l ∈ la do
            if region(l) > T_r then
                N_wall ← N_wall ∪ l
            end if
        end for
        X ← MaskedAHSN(S_surface, e_surface, N_wall)    » Derive AHSN feature descriptor without wall area
    until there are no high peaks in X
    return X                                 » Wall-eradicated AHSN feature descriptor
end procedure


d) Evolutionary support vector machine 
classifier 

An Evolutionary Support Vector Machine classifier is used to classify
the pulmonary nodules. Support Vector Machines (SVMs) are supervised
learning models with associated learning algorithms that analyze data
and recognize patterns. SVMs can be trained with different kernel
types and various parameter selections. A common form of the SVM uses
the Radial Basis Function (RBF) as the kernel. For the traditional
SVM, the values of control parameters such as the box constraint C
and the kernel parameter $\gamma$ must be specified. In contrast, the
input of the ESVM consists of the training data only, and the values
of the control parameters are tuned automatically by a Genetic
Algorithm (GA).


(1) Classifier training 

During the classifier training stage, a dataset of feature vectors is
constructed. A balanced dataset is constructed to attain better
training: N/2 nodules and N/2 non-nodules are selected randomly from
the detected nodule candidates. The balanced dataset is divided into
training and testing datasets to validate the classifier. The
training dataset, $X = \{(x_i, y_i)\}_{i=1}^{N}$, is created by
selecting N nodule candidates, with each training pair consisting of
an input feature vector and its corresponding known class.

The high performance of the ESVM stems primarily from optimizing the
parameter setting of the SVM.

The radial basis function kernel is given by

$$K_r(x_i, x_j) = \exp\!\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)$$
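For reference, the RBF kernel in its standard form, $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$, can be evaluated directly. The function name `rbf_kernel` is hypothetical, and the feature vectors are assumed to be plain sequences of equal length.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """exp(-gamma * ||xi - xj||^2) for two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


same = rbf_kernel([0.2, 0.8], [0.2, 0.8], gamma=4.0)    # identical vectors
near = rbf_kernel([0.2, 0.8], [0.3, 0.7], gamma=4.0)
far = rbf_kernel([0.2, 0.8], [0.9, 0.1], gamma=4.0)
```

The kernel equals 1 for identical vectors and decays towards 0 with distance; a larger $\gamma$ makes the decay faster, which is why $\gamma$ has to be tuned together with C.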

With the RBF kernel, the ESVM performs model selection over the box
constraint C and the kernel parameter $\gamma$, optimizing the pair
(C, $\gamma$) to improve performance. The ESVM uses a GA to optimize
these parameters through an efficient chromosome representation and
an intelligent crossover operation. The procedure of the ESVM is as
follows:

1) Initialize a random population of chromosomes. Each chromosome
   consists of a pair of values ($\gamma$, C).

2) The fitness of a chromosome is defined as

   $$\mathrm{Fitness} = \frac{\mathrm{sensitivity} + \mathrm{specificity} + \mathrm{accuracy}}{3}$$

3) A new generation is created by the following procedure.

   a. The best chromosome is copied to the next generation.

   b. Replication: with a probability $P_r$, a chromosome is selected
      with a probability in proportion to its fitness value and is
      copied to the next generation.

   c. Crossover: with a probability $P_c$, two chromosomes are
      selected by the roulette wheel selection method in proportion
      to their fitness values. A new chromosome is created in the
      next generation by combining their values.

   d. Mutation: with a probability $P_m$, a chromosome is selected in
      proportion to its fitness value, and a random change is made to
      create a new chromosome in the next generation.

4) Fitness values are calculated for the new generation. Step 3 is
   repeated until convergence.

5) The best chromosome of the final generation is selected as the
   parameter setting for the ESVM.
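The procedure above can be sketched as a small genetic search over ($\gamma$, C) pairs. This is an illustrative sketch, not the authors' implementation: the names `evolve` and `toy_fitness` are hypothetical, a fixed number of generations stands in for the convergence test, crossover simply takes $\gamma$ from one parent and C from the other, and the fitness function is a stand-in that must return positive values. In the ESVM it would train an SVM with the given pair and return (sensitivity + specificity + accuracy) / 3.

```python
import random


def evolve(fitness, bounds, pop_size=20, generations=30,
           p_r=0.2, p_c=0.6, p_m=0.2, seed=0):
    """Search (gamma, C) pairs maximising `fitness`; returns (best, score)."""
    rng = random.Random(seed)
    (g_lo, g_hi), (c_lo, c_hi) = bounds

    def mutate(chromosome):
        # Random change: redraw one of the two genes inside its bounds.
        g, c = chromosome
        if rng.random() < 0.5:
            g = rng.uniform(g_lo, g_hi)
        else:
            c = rng.uniform(c_lo, c_hi)
        return (g, c)

    # Step 1: random initial population.
    population = [(rng.uniform(g_lo, g_hi), rng.uniform(c_lo, c_hi))
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g, c) for g, c in population]   # steps 2 and 4
        nxt = [population[scores.index(max(scores))]]     # step 3a: elitism
        while len(nxt) < pop_size:
            op = rng.choices(("replicate", "crossover", "mutate"),
                             weights=(p_r, p_c, p_m))[0]
            # Roulette wheel: selection probability proportional to fitness.
            a, b = rng.choices(population, weights=scores, k=2)
            if op == "replicate":
                nxt.append(a)                             # step 3b
            elif op == "crossover":
                nxt.append((a[0], b[1]))                  # step 3c
            else:
                nxt.append(mutate(a))                     # step 3d
        population = nxt
    scores = [fitness(g, c) for g, c in population]
    best = scores.index(max(scores))
    return population[best], scores[best]                 # step 5


def toy_fitness(gamma, c):
    """Hypothetical positive fitness with its peak at gamma=5, C=9."""
    return 1.0 / (1.0 + (gamma - 5.0) ** 2 + (c - 9.0) ** 2)


best, score = evolve(toy_fitness, bounds=((0.0, 10.0), (0.0, 20.0)))
```

Copying the best chromosome unchanged into each new generation guarantees that the best fitness never decreases from one generation to the next.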


For ESVM training, the nonlinear separating hyperplane with maximal
margin in the high-dimensional space is adapted automatically through
the kernel. In this manner, the solution space is refined and
converges to the optimal or near-optimal solution. Training stops
when the optimized maximal-margin hyperplane has been obtained for
all input training vectors. The ESVM parameters are optimized as
shown in Fig. 5.



Fig-5 : ESVM-Parameters optimization 


(2) Nodule detection and classifier validation 

After training, the input features derived from the nodule candidates
in the test data are provided to the classifier to obtain the
predicted class. The trained classifier predicts a class from the
input feature vectors alone.

In the training process, the ESVM finds the maximal-margin hyperplane
in the higher-dimensional space of the input feature vectors. The
nodules are then separated from non-nodules by this hyperplane in the
feature space.

The performance measures are given by

$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad (10)$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad (11)$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (12)$$

$$\mathrm{SPC} = 1 - \mathrm{FPR} \qquad (13)$$

where TP and FN are the numbers of nodules classified as true
positives and false negatives, respectively, and FP and TN are the
numbers of non-nodules classified as false positives and true
negatives. SPC abbreviates specificity. The True Positive Rate (TPR)
is the number of correctly predicted positives divided by the total
number of positive cases. The False Positive Rate (FPR) is the number
of negative cases predicted as positive divided by the total number
of negative cases. The accuracy is the proportion of true results in
the population. Specificity denotes the probability of a negative
test given that the patient is well. Table 1 shows the overall
performance of the nodule detection.
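The four measures, together with the GA fitness defined earlier as the mean of sensitivity, specificity and accuracy, follow directly from the confusion-matrix counts. In the sketch below the function names and the counts are hypothetical; the counts describe an invented balanced test set of 60 nodules and 60 non-nodules and are not taken from the experiments.

```python
def metrics(tp, fp, fn, tn):
    """TPR, FPR, accuracy and specificity from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    spc = 1.0 - fpr
    return {"TPR": tpr, "FPR": fpr, "Accuracy": accuracy, "SPC": spc}


def fitness(m):
    """Mean of sensitivity (TPR), specificity and accuracy."""
    return (m["TPR"] + m["SPC"] + m["Accuracy"]) / 3.0


# Hypothetical counts, for illustration only.
m = metrics(tp=59, fp=14, fn=1, tn=46)
print({name: round(100 * value, 1) for name, value in m.items()})
print(round(fitness(m), 3))
```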


III. RESULTS AND DISCUSSION 

This section presents the implementation details and experimental
results for each stage of the CAD system. First, the implementation
details and results obtained during lung volume segmentation and
nodule candidate detection are discussed. Then, the implementation
details and performance of the entire classification scheme are
presented.

A. Database and Imaging Protocol 


The Lung Image Database Consortium (LIDC) database [1, 4] is a
publicly available database of thoracic CT scans that serves as a
medical imaging research resource. The dataset comprises 2114 slices,
and the nodule diameter ranges from 3 mm to 30 mm. There were about
200 slices per scan, and each slice is 512 × 512 pixels, with 4096
gray-level values in HU. The pixel size in the database ranged from
0.5 mm to 0.76 mm, and the reconstruction interval ranged from 1 mm
to 3 mm. Four radiologists reviewed each scan and drew outlines for
nodules 3.0 mm or larger in effective size. The ground truth was
established in a blinded reading session followed by an unblinded
reading session.


B. Lung volume segmentation and nodule candidate 
detection 

First, the lung volume is segmented by thresholding and 3-D connected
component labelling. The results of each stage of the lung volume
segmentation for distinct lung slices are shown in Fig. 7. The input
CT images are shown in Fig. 7(a). Initially, low-density regions are
separated from high-density regions based on the threshold value to
derive the segmented lung volume. Fig. 7(b) shows the thresholded
result for every lung region. The thresholded result contains
undesirable elements (air outside the body and gas in the intestine).
To extract the lung region, 3-D connected component labelling is
applied to the thresholded image. The extracted lung region is shown
in Fig. 7(c). The extracted lung regions have certain flaws, such as
holes and critical sections, which are removed by hole filling and
contour correction, respectively. Contour correction is done to
include juxta-pleural nodules in the critical section. It modifies
the critical section by joining pairs of critical points that are
separated by a distance of less than 20 mm. Contour-corrected results
for every input slice are shown in Fig. 7(d).

Next, nodule candidate detection is executed on the segmented lung
volume. The nodules are detected by enhancing spherical objects by
means of dot enhancement filtering. In total, five smoothing scales
are used: 0.75, 1.33, 2.37, 4.21 and 7.5. The resulting image
sections have sizes of 5, 8, 12, 19, and 32. The block is
interpolated to an isotropic voxel resolution of 1 mm.



Hence the nodule candidates were detected with a reduced number of
false positives. A balanced dataset comprising equal numbers of
nodules and non-nodules is constructed.

Fig-7 : Results of lung volume segmentation

C. Feature extraction and classification

The shape-based feature descriptor is extracted from the image
section, and walls are eliminated using feature refinement. AHSN
features are extracted. The shape-based descriptor is more precise
and stable, as the Hessian matrix summarizes the predominant
directions in a specified neighbourhood of a point. The dimension of
the AHSN feature is smaller than that of conventional methods. The
shape-based feature gives better results after the refinement
technique is applied.

The 3-D shape-based feature descriptor is applied to the nodule
detection scheme. Classification is performed by splitting the data
into training and testing datasets in three different ratios. In the
ESVM, the box constraint C and kernel parameter $\gamma$ are obtained
via genetic optimization. For the optimization, the parameters for
producing the new generation are set as $P_r = 0.2$, $P_c = 0.6$ and
$P_m = 0.2$. After evolutionary computation, the values obtained lie
in the interval (3, 15) for the box constraint C and (4, 10) for the
kernel parameter $\gamma$. Fig-6 shows the receiver operating
characteristic (ROC) curve, indicating the performance of the SVM for
the three ratios of classification data. The overall performance of
the nodule detection system is assessed for all detected nodule
candidates. Fig-8 shows the ROC curve indicating the performance of
the ESVM for the three ratios of classification data. The detection
scheme shows robust and valid performance in detecting nodules.

Fig-6 : ROC curve of SVM (x-axis: false positive rate, FPR)

Fig-8 : ROC curves of ESVM (panel title: ROC for classification for ESVM)

Table-1 : Overall performance results of the proposed CAD system for
different dimensions of AHSN features with ESVM classifier

Ratio    Accuracy (%)   Specificity (%)   Sensitivity (%)
20-80    51.4           21.4              78.7
50-50    61.3           38.6              84
80-20    87.5           76.6              98.3

Table 1 gives the performance of the nodule detection system when the
angular histograms of surface normal (AHSN) feature is applied. The
detection system achieves 11 FPs per scan, with 98.3% sensitivity.
Table 2 shows the performance comparison of the CAD scheme using SVM
and ESVM.

Parameter         SVM     ESVM
Accuracy (%)      74.1    87.5
Specificity (%)   66.6    76.6
Sensitivity (%)   81.6    98.3

These results show that the CAD system effectively reduces the number
of false positive detections while maintaining better sensitivity.


IV. CONCLUSION

In this paper, a computer-aided system for pulmonary nodule detection
based on an Evolutionary Support Vector Machine classifier is
presented. The paper describes the complete design of the CAD system
and gives a detailed performance analysis on the publicly available
LIDC database. To detect pulmonary nodules, the lung volume is
segmented by thresholding and a 3D-connected component labelling
method. From the segmented lung volume, nodule candidates are
detected by eigenvalue decomposition of the Hessian matrix and
multi-scale dot enhancement filtering. Next, the shape-based feature
descriptors are extracted from the detected nodule candidates and
refined to remove walls. The refined feature descriptors are fed as
input to the ESVM to detect nodules. The detection system attains a
sensitivity of 98.3% at 11 false positives per scan.